TLB map setup per chip #179

broskoTT · 2024-10-17T13:20:14Z

Currently tt_SiliconDevice accepts a single core-to-TLB map for all chips. That goes against our effort to separate cluster and chip responsibilities (related to #157).
This change includes:

Added chip_id to setup_core_to_tlb_map; there should be one map per MMIO chip.
Started a chip API tests file.
Added an API example/test showing how TLB setup currently works. It also does some minor configuration testing, checking that getting a TLB throws when expected and does not throw otherwise.
Minor cosmetic changes to other API tests.
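A minimal sketch of what per-chip TLB map setup means, under a simplified model. `XY`, `TlbMapFn`, `DeviceModel` and `get_tlb_index` are hypothetical names, not UMD's actual types, and the real `setup_core_to_tlb_map` signature may differ. Each MMIO chip gets its own mapping function from core location to TLB index, and a lookup throws when the chip has no map or the core has no TLB, which is the kind of configuration check the API test exercises.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a core location such as UMD's tt_xy_pair.
struct XY {
    std::size_t x;
    std::size_t y;
};

using chip_id_t = int;

// Maps a core location to a TLB index; a negative value means "no TLB for this core".
using TlbMapFn = std::function<std::int32_t(XY)>;

// Hypothetical model: one core-to-TLB map per MMIO chip instead of a single global map.
class DeviceModel {
public:
    void setup_core_to_tlb_map(chip_id_t chip, TlbMapFn fn) {
        maps_[chip] = std::move(fn);  // replaces any earlier map for this chip
    }

    // Throws if the chip was never configured or the core has no TLB on that chip.
    std::int32_t get_tlb_index(chip_id_t chip, XY core) const {
        auto it = maps_.find(chip);
        if (it == maps_.end()) {
            throw std::runtime_error("no TLB map for chip " + std::to_string(chip));
        }
        const std::int32_t idx = it->second(core);
        if (idx < 0) {
            throw std::runtime_error("core has no TLB on chip " + std::to_string(chip));
        }
        return idx;
    }

private:
    std::unordered_map<chip_id_t, TlbMapFn> maps_;
};
```

In this model a client calls `setup_core_to_tlb_map` once per MMIO chip, and every later lookup is resolved against that chip's own map rather than a shared one.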

Contributes to #154, since it adds more API tests.
This change will require tt_metal changes.

Breaks existing usages of setup_core_to_tlb_map.
Corresponding tt_metal PR: Change tt_cluster to use single tt_SiliconDevice tt-metal#13949
tt_debuda change: none (not used there)

joelsmithTT · 2024-10-17T14:15:04Z

I will give this PR a more thorough review, but my initial reaction is that the relationship between TLB index and (x,y) location is something UMD should hide.

UMD has to know about the TLB windows, there's no way around that. But the application does not. It shouldn't matter what TLB is mapped where as long as UMD/KMD manage these resources effectively.
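To illustrate the alternative described here, a hypothetical sketch (class and method names are invented for illustration) in which the application only names a core and the driver picks a free TLB window on first use, reuses it on later accesses, and returns it to the pool on release. The window index is returned only so the sketch can be inspected; in the design discussed here it would stay inside UMD. Programming the actual window hardware is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch: callers name only a core (x, y); which TLB window
// backs it is an internal detail of this allocator.
class TlbWindowAllocator {
public:
    explicit TlbWindowAllocator(std::size_t num_windows) : free_(num_windows) {
        // Stored in reverse so that pop_back() hands out window 0 first.
        for (std::size_t i = 0; i < num_windows; ++i) {
            free_[i] = static_cast<std::int32_t>(num_windows - 1 - i);
        }
    }

    // Returns the window backing (x, y), assigning a free one on first use.
    std::int32_t acquire(std::size_t x, std::size_t y) {
        const auto key = std::make_pair(x, y);
        auto it = assigned_.find(key);
        if (it != assigned_.end()) return it->second;
        if (free_.empty()) throw std::runtime_error("all TLB windows in use");
        const std::int32_t idx = free_.back();
        free_.pop_back();
        assigned_.emplace(key, idx);
        return idx;
    }

    // Returns the window to the pool so another core can reuse it.
    void release(std::size_t x, std::size_t y) {
        auto it = assigned_.find(std::make_pair(x, y));
        if (it == assigned_.end()) return;
        free_.push_back(it->second);
        assigned_.erase(it);
    }

private:
    std::vector<std::int32_t> free_;
    std::map<std::pair<std::size_t, std::size_t>, std::int32_t> assigned_;
};
```

The point of the design is that exhaustion and reuse are handled in one place, so no caller ever has to agree on a global core-to-index table.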

broskoTT · 2024-10-18T07:20:21Z


I agree, but bear in mind that this is not what this PR is about. This is not a final API; it just points in the right direction, namely that TLB setup is chip-specific. We might end up removing it altogether.

joelsmithTT

I understand the intent behind the change, but it expands an existing pattern that I regard as a core UMD design flaw: the pervasive use of hash tables for per-chip information lookup. This pattern, which exists in UMD independently of this change, undermines performance and has led to workarounds (get_fast_pcie_static_tlb_write_callable and tt::Writer).

As an interim change along a pathway to a better API, I have no objection.
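A minimal sketch of the pattern being criticized next to one common alternative, assuming chip IDs are dense integers starting at 0 (an assumption, not something this PR establishes). `ChipState`, `HashedChips` and `IndexedChips` are hypothetical names. The first struct hashes the chip ID on every access; the second replaces hashing with a vector index, and hot-path callers can resolve the reference once and keep it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using chip_id_t = int;

// Hypothetical per-chip state that would otherwise be spread across several
// std::unordered_map<chip_id_t, ...> members.
struct ChipState {
    std::uint64_t bar0_base = 0;
    std::int32_t num_static_tlbs = 0;
};

// Pattern 1: a hash-table lookup on every access.
struct HashedChips {
    std::unordered_map<chip_id_t, ChipState> chips;
    const ChipState& get(chip_id_t id) const { return chips.at(id); }
};

// Pattern 2: with dense chip IDs, a vector index replaces hashing. A caller may
// keep the returned reference as long as the vector is not resized.
struct IndexedChips {
    std::vector<ChipState> chips;
    const ChipState& get(chip_id_t id) const {
        return chips.at(static_cast<std::size_t>(id));  // bounds-checked
    }
};
```

Both return the same data; the difference is only in what each access costs and whether the per-chip state lives in one place.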


broskoTT · 2024-10-18T19:34:22Z


I completely agree. Still, I'm fine with having a more flexible API for slower access alongside a performant option (such as tt::Writer, or the TLBWindow you suggested, which I ended up suggesting should become AbstractIO).
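A hypothetical sketch of that two-tier split: a flexible call that resolves `(chip, x, y)` on every write, and a handle that resolves once and then writes with no lookup. `Device`, `Window`, `Handle` and `resolve` are invented names; this is not the actual interface of tt::Writer, TLBWindow or AbstractIO, and the window is modeled as a plain buffer rather than mapped device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

using chip_id_t = int;

// Hypothetical stand-in for a mapped TLB window: a plain buffer of 32-bit words.
struct Window {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(256, 0);
};

class Device {
public:
    // Handle to an already-resolved window: writes go straight to it.
    class Handle {
    public:
        explicit Handle(Window& w) : w_(&w) {}
        void write32(std::size_t word, std::uint32_t value) { w_->mem[word] = value; }  // unchecked
    private:
        Window* w_;
    };

    void map_window(chip_id_t chip, int x, int y) { windows_[std::make_tuple(chip, x, y)] = Window{}; }

    // Flexible (slower) path: every call resolves (chip, x, y) through the map.
    void write32(chip_id_t chip, int x, int y, std::size_t word, std::uint32_t value) {
        windows_.at(std::make_tuple(chip, x, y)).mem.at(word) = value;
    }

    std::uint32_t read32(chip_id_t chip, int x, int y, std::size_t word) const {
        return windows_.at(std::make_tuple(chip, x, y)).mem.at(word);
    }

    // Fast path: resolve once. std::map does not move its elements on insert,
    // so the handle stays valid until that window is erased or Device is destroyed.
    Handle resolve(chip_id_t chip, int x, int y) { return Handle(windows_.at(std::make_tuple(chip, x, y))); }

private:
    std::map<std::tuple<chip_id_t, int, int>, Window> windows_;
};
```

The flexible call keeps bounds checks and lookups for convenience, while the handle moves that cost out of the hot loop.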

On the other hand, I do want to investigate the maps we use a bit. In theory, a hash map lookup shouldn't be expensive, although the standard library's std::unordered_map is known to have a poor implementation and much better options exist. I also suspect that simply switching everything to std::map and std::set would improve performance, since they should be faster for very small containers, which ours are. Hash maps are not slow by design, and I'd very much like to get to the bottom of where the performance issue really is.
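One way to start that investigation is a micro-benchmark of lookups in containers as small as ours (one entry per chip). The sketch below is hypothetical and crude: it times std::unordered_map, std::map and, as an extra point of comparison not mentioned in the thread, a linear scan over a std::vector of pairs. Timings depend on the machine, compiler and flags, so it only prints numbers; the returned sums merely confirm that all three containers give the same answers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr int kChips = 4;  // tiny container, one entry per chip
constexpr int kLookups = 1'000'000;

// Runs `lookup` kLookups times; returns {sum of results, elapsed nanoseconds}.
template <typename Lookup>
std::pair<std::int64_t, long long> time_lookups(Lookup lookup) {
    std::int64_t sum = 0;  // consumed so the loop is not optimized away
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < kLookups; ++i) sum += lookup(i % kChips);
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                        std::chrono::steady_clock::now() - start).count();
    return {sum, static_cast<long long>(ns)};
}

struct Results { std::int64_t hashed, ordered, linear; };

Results run_benchmark() {
    std::unordered_map<int, int> hashed;
    std::map<int, int> ordered;
    std::vector<std::pair<int, int>> linear;
    for (int chip = 0; chip < kChips; ++chip) {
        hashed[chip] = chip * 10;
        ordered[chip] = chip * 10;
        linear.emplace_back(chip, chip * 10);
    }
    const auto h = time_lookups([&](int k) { return hashed.at(k); });
    const auto o = time_lookups([&](int k) { return ordered.at(k); });
    const auto l = time_lookups([&](int k) {
        auto it = std::find_if(linear.begin(), linear.end(),
                               [k](const auto& p) { return p.first == k; });
        return it->second;  // every key 0..kChips-1 is present
    });
    std::cout << "unordered_map: " << h.second << " ns\n"
              << "map:           " << o.second << " ns\n"
              << "linear vector: " << l.second << " ns\n";
    return {h.first, o.first, l.first};
}
```

Call `run_benchmark()` from a `main` built with optimizations enabled; a dedicated harness such as Google Benchmark would give more trustworthy numbers than this hand-rolled loop.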

broskoTT · 2024-10-22T09:45:16Z

The related PR tenstorrent/tt-metal#13949 was approved, so I'm merging this PR now.


broskoTT requested review from joelsmithTT and pjanevskiTT October 17, 2024 13:20

broskoTT added the "changes api" label (API changing PR, needs changes in client code) Oct 18, 2024

broskoTT mentioned this pull request Oct 18, 2024 in Change tt_cluster to use single tt_SiliconDevice tenstorrent/tt-metal#13949

joelsmithTT approved these changes Oct 18, 2024


Review threads (resolved): device/tt_device.h (outdated), device/tt_device.h, device/tt_silicon_driver.cpp

pjanevskiTT approved these changes Oct 21, 2024


broskoTT added 12 commits October 22, 2024 09:46

d720fe1 tlb_map per core
713ab16 add api test for tlbs
b694e0f fix existing tests
266fcb4 minor compilation fix in tt_silicondevice
44c7aaa test in apitests that getting tlb fails when expected and doesn't when it shouldn't
471e35a minor grayskull fix
57dbd23 attempt to fix GS
d92846e fix gs
12b7f91 tlb_init_per_chip
e1ba469 minor test cosmetic change
a791400 address comments
a4e6177 minor apitest->apiclustertest

broskoTT force-pushed the brosko/tlb_map_per_chip branch from 40710a0 to a4e6177 October 22, 2024 09:47

5025fdb Merge branch 'main' into brosko/tlb_map_per_chip

broskoTT merged commit b473c5d into main Oct 23, 2024
17 checks passed

broskoTT deleted the brosko/tlb_map_per_chip branch October 23, 2024 09:44
